Dedication


This paper is dedicated to the memory of Gwilym M. Jenkins.
The contributions to time series analysis of Gwilym M. Jenkins
(1932-1982) will always be embedded deeply in the field. His
work (especially his joint work with George Box) has influenced
diverse fields of science. I was fortunate to come to know
Gwilym early in my career, on a visit to London in 1958. He
spent 1959-1960 with me at Stanford and I spent 1961-1962 with
him at Imperial College. He earned the respect and affection
of all who knew him or his work. His life and work were heroic.
As we contemplate the sadness of his death at so young an age,
may we continue to enjoy his spirit.


1. FUN.STAT approach to time series model identification


The need to analyze data in the form of time series arises in
diverse fields. What counts as a conventional analysis is not
the same in each field. Engineers tend to estimate mean,
variance, and spectrum (which may be regarded as a non-parametric
signature of models). Economists and forecasters tend to
estimate mean, variance, and time domain models such as ARMA or
ARIMA (which are parametric models). Spectral and ARMA
estimation are not routine procedures; there are many algorithms
for spectral estimation and for time domain model identification.

In addition there are critics of spectral and correlation
based methods of time series analysis, of whom the most
prominent is Mandelbrot (1982). This paper describes an
approach to time series analysis which attempts to use diverse
methods of analysis simultaneously in order to meet the needs
of all the fields of application of time series analysis.

It also aims to integrate spectral and correlation methods
with methods for long memory and/or long-tailed time series.

An approach to spectral analysis and time domain modeling
of time series is described in Parzen (1979), (1980), (1981),
(1982), (1983a), (1983b), (1983c). An approach (motivated by
time series methods) to statistical data analysis of probability
distributions is described in Parzen (1979), (1982), (1983a),
(1983b), (1983c), (1983d); it is called the Quantile Data Analysis
and FUN.STAT approach, to connote that it is based on functional
statistical inference, entropy and information measures, and
quantile and density-quantile functions.
Parzen (1980) states that "a criterion that any general
time series modeling strategy must fulfill is that its
conceptual framework should provide a role for the continuing
quest for a time series decomposition. ... Thus it seems
critical that a successful approach to time series modeling
employ simultaneously both the spectral domain and the time
domain." This paper discusses the enhanced insight to be
obtained by also employing simultaneously the quantile domain
and the information domain.

This paper discusses how to add to our approach to time
series model identification new diagnostic measures, based on
quantile data analysis of the sample spectral density function
and on information measures. The approach implemented in our time
series computer program library TIMESBOARD is called ARSPID
(for autoregressive spectral identification). The "enhanced"
approach could be called ARSPIQ (for autoregressive spectral
information quantile identification).

In empirical time series analysis a central role in model
identification is played by the concept of memory [see Parzen (1981)],
which yields a classification of a time series into one of the
following three classes:

no memory = white noise

short memory = stationary ergodic but not white noise

long memory = trends, seasonal cycles, long cycles, non-stationary


When a time series is classified as no memory (white noise),
it requires no further analysis (except for quantile
identification of its probability distribution).

When a time series is classified as a short memory time
series, it is described (parametrised) by ARMA(p,q) schemes
that transform it to white noise. The orders p and q are not
measures of the length of memory.

When a time series is classified as a long memory time
series, it is described (parametrised) by operators which
transform it to a short memory time series.

To describe the dependence structure of a time series one
introduces quantitative indices which are non-parametric
statistics guiding our choice of parametric models.

An ARMA model (which is a finite parameter time domain
model) is a parametric description of the dependence structure
of a short memory time series. A nonparametric description of
its dependence structure is provided by the spectral density
function, from which one can deduce "significant frequencies"
(at which the spectral density has local maxima).

The operations which transform a long memory time series
to a short memory one (or which represent a long memory time
series in terms of a short memory one) can be considered a
parametric time domain model. Nonparametric descriptions of
long memory properties are introduced in this paper in terms of
the index of regular variation of the spectral density at a
specified frequency, usually zero frequency.


2. Quantile identification of probability distributions


To identify probability distributions that fit a time series
sample $Y(t)$, $t = 1, \dots, T$, one treats the sample as a data batch
$X_1, \dots, X_n$ (with $n = T$).

For a data batch $X_1, \dots, X_n$ one can define the sample
distribution function $\tilde F(x)$, $-\infty < x < \infty$, by

$$\tilde F(x) = \text{fraction of } X_1, \dots, X_n \text{ which are } \le x,$$

and the sample quantile function $\tilde Q(u)$, $0 \le u \le 1$, by

$$\tilde Q(u) = \tilde F^{-1}(u) = \inf\{x : \tilde F(x) \ge u\}.$$

Quick and dirty insight into the distributions that fit the
univariate distribution function F is provided by a plot of
the sample informative quantile function

$$IQ(u) = \frac{\tilde Q(u) - \tilde Q(0.5)}{2\{\tilde Q(0.75) - \tilde Q(0.25)\}}, \quad 0 < u < 1.$$

The IQ function is plotted with a vertical scale from -1
to 1; its values are truncated when they fall outside [-1, 1]. For
ease of interpretation of the IQ function, we also plot the IQ
function of the uniform distribution, which is a straight line
passing through (0, -0.5) and (1, 0.5).
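A minimal sketch of the sample quantile function and the informative quantile function in Python with NumPy. It assumes $\tilde Q(u) = X_{(j)}$ for $(j-1)/n < u \le j/n$, which is what $\inf\{x : \tilde F(x) \ge u\}$ gives for a data batch; the function names are hypothetical, and the truncation to [-1, 1] mirrors the plotting convention just described.

```python
import numpy as np


def sample_quantile(x, u):
    """Sample quantile function Q(u) = inf{x : F(x) >= u} of a data batch.

    For order statistics X_(1) <= ... <= X_(n), Q(u) = X_(j) when
    (j - 1)/n < u <= j/n; u = 0 is mapped to X_(1).
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    u = np.asarray(u, dtype=float)
    j = np.clip(np.ceil(n * u).astype(int), 1, n)   # 1-based rank
    return xs[j - 1]


def informative_quantile(x, u):
    """IQ(u) = {Q(u) - Q(0.5)} / 2{Q(0.75) - Q(0.25)}, truncated to [-1, 1]."""
    median = sample_quantile(x, 0.5)
    iqr = sample_quantile(x, 0.75) - sample_quantile(x, 0.25)
    iq = (sample_quantile(x, u) - median) / (2.0 * iqr)
    return np.clip(iq, -1.0, 1.0)


# An evenly spread (uniform-like) batch: IQ(u) should be close to the line u - 0.5.
x = (np.arange(1, 1001) - 0.5) / 1000
print(informative_quantile(x, np.array([0.1, 0.5, 0.9])))
```

A single large outlier drives IQ near u = 1 far outside the range, which is why the plotted values are clipped.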
The distribution functions $F(x)$ that we seek to fit to
the data are usually of the form

$$F(x) = F_0\left(\frac{x - \mu}{\sigma}\right)$$

for parameters $\mu$ and $\sigma$ to be estimated, and $F_0(x)$ a known
distribution function. The most important cases of $F_0(x)$ are:

normal: $F_0(x) = \Phi(x) = \int_{-\infty}^{x} \phi(y)\,dy$, where $\phi(y) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} y^2)$;

exponential: $F_0(x) = 1 - e^{-x}$, $x > 0$.

One can test (before parameter estimation) the goodness of fit
of $\tilde F(x)$ to $F(x) = F_0((x - \mu)/\sigma)$ by introducing the weighted spacings

$$\tilde d(u) = \frac{1}{\tilde\sigma_0}\, f_0Q_0(u)\, \tilde q(u)$$

where $f_0Q_0(u) = f_0(F_0^{-1}(u))$ is the density-quantile function
of the specified distribution; $\tilde q(u) = \tilde Q'(u)$ is the sample
quantile density function (expressible in terms of spacings, or
differences of successive order statistics); and

$$\tilde\sigma_0 = \int_0^1 f_0Q_0(u)\, \tilde q(u)\, du$$

is an estimator of $\sigma$ called the score deviation. The test
function is the cumulative weighted spacings function

$$\tilde D(u) = \int_0^u \tilde d(t)\, dt, \quad 0 \le u \le 1,$$

which one compares with the uniform distribution $D(u) = u$.

To test for exponentiality, take $f_0Q_0(u) = 1 - u$. The
diagnostic function $\tilde D(u)$ will appear linear when the data are
exponential. In the important case of a mixture distribution
[that is, the lower order statistics represent values from an
exponentially distributed sub-population], $\tilde D(u)$ will be linear
over an initial interval $0 \le u \le p$. When the data batch is the
sample spectral density, the value p estimates the proportion
of the total power which is white noise.
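The following sketch computes $\tilde D(u)$ on the grid $u_j = j/n$ in Python with NumPy. It assumes that $\tilde q(u)\,du$ is represented by the spacings $X_{(j+1)} - X_{(j)}$, placed at $u = j/n$ where the sample quantile function jumps; this is one natural discretization, not necessarily the one used in TIMESBOARD, and the function name is hypothetical.

```python
import numpy as np


def cumulative_weighted_spacings(x, density_quantile):
    """Cumulative weighted spacings D(u) evaluated at u_j = j/n, j = 1..n-1.

    density_quantile(u) is f0Q0(u) = f0(F0^{-1}(u)) for the hypothesized
    F0; for example, lambda u: 1 - u tests for exponentiality.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    u = np.arange(1, n) / n
    # The sample quantile function jumps by X_(j+1) - X_(j) at u = j/n,
    # so the weighted spacings are f0Q0(j/n) * (X_(j+1) - X_(j)).
    weighted = density_quantile(u) * np.diff(xs)
    sigma0 = weighted.sum()          # discretized integral of f0Q0(u) q(u) du
    return u, np.cumsum(weighted) / sigma0


# Expected order statistics of a unit exponential sample of size n.
n = 200
expected_exp = np.cumsum(1.0 / np.arange(n, 0, -1))
u, D = cumulative_weighted_spacings(expected_exp, lambda u: 1 - u)
```

The expected exponential order statistics have spacings $1/(n-j)$, so every weighted spacing equals $1/n$ and $\tilde D$ is exactly linear, as the text predicts for exponential data; a mixture would show linearity only over an initial stretch of u.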

Diagnostic measures of time series parameters [the sample
spectral density and correlogram] are provided by plots of
suitable IQ(u) and D(u) functions. Examples of their power as
discriminators of memory are given in Section 7.

Quantile Data Analysis of Sample Spectral Density

When the sample mean $\bar Y$ is large, it is necessary to transform
$Y(t)$ to $Y(t) - \bar Y$; otherwise one would always obtain a diagnostic
that $Y(\cdot)$ is a long memory time series. An alternative first
step in time series analysis is to replace $Y(t)$ by

$$\frac{Y(t) - \tilde Q(0.5)}{2\{\tilde Q(0.75) - \tilde Q(0.25)\}}.$$
When $Y(t)$ is a pre-processed time series (the sample mean or
median has been subtracted), one computes the sample Fourier
transform

$$\tilde Y(\omega) = \sum_{t=1}^{T} Y(t) \exp(-2\pi i \omega t)$$

on an equi-spaced grid of frequencies in $0 \le \omega < 1$ of the form
$\omega = k/S$, $k = 0, 1, \dots, S - 1$. We call S the spectral computation
number; one should choose $S \ge T + M$, where M is the maximum
lag at which one computes sample correlations $\tilde\rho(v)$.

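The sample spectral density $\tilde f(\omega)$, $0 \le \omega \le 1$, is computed at
$\omega = k/S$ by squaring and normalizing the sample Fourier transform:

$$\tilde f(\omega) = |\tilde Y(\omega)|^2 \div \frac{1}{S}\sum_{k=0}^{S-1} \left|\tilde Y\!\left(\frac{k}{S}\right)\right|^2.$$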
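A minimal NumPy sketch of $\tilde f(k/S)$: zero-pad the pre-processed series to length S, take the FFT, square, and divide by the average over the grid. NumPy's FFT indexes time from 0 rather than 1, which multiplies $\tilde Y(k/S)$ by a unit-modulus phase and leaves $|\tilde Y|^2$ unchanged. The function name is hypothetical.

```python
import numpy as np


def sample_spectral_density(y, S):
    """Normalized sample spectral density f(k/S), k = 0, ..., S - 1.

    y is assumed to be pre-processed already (sample mean or median removed).
    """
    y = np.asarray(y, dtype=float)
    if S < y.size:
        raise ValueError("spectral computation number S must be at least T")
    # np.fft.fft with n=S zero-pads y to length S and returns
    # sum_t y[t] exp(-2 pi i k t / S) for k = 0, ..., S - 1.
    power = np.abs(np.fft.fft(y, n=S)) ** 2
    # Divide by (1/S) sum_k |Y(k/S)|^2 so the ordinates average to 1.
    return power / power.mean()


t = np.arange(64)
f = sample_spectral_density(np.cos(2 * np.pi * 8 * t / 64), S=64)
```

For this pure cosine at frequency 8/64 all the power sits at k = 8 and k = 56, so $\tilde f$ equals 32 at those two ordinates and is numerically zero elsewhere.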

The classification of the time series as no memory (or
white noise) is equivalent to the random variables representing
the values of the sample spectral density

$$\tilde f(\omega), \quad \omega = k/S, \quad k = 1, \dots, [S/2],$$

having the property that they are asymptotically independent
and exponentially distributed. Therefore tests for white noise
can be obtained from quantile data analysis tests for
exponentiality of the sample spectral density $\tilde f(\omega)$ at suitable
frequencies.

The data batch $\tilde f(k/S)$, $k = 0, 1, \dots, S/2$, is tested for
exponentiality by forming its informative quantile function
IQ(u) and its cumulative weighted spacings function D(u), with
$f_0Q_0(u) = 1 - u$. How one interprets the quantile data analysis of
the sample spectral density (periodogram) is best illustrated by
examples.
3. Correlation diagnostics for model memory identification

The time series analyst seeks to develop, for an observed
sample $Y(t)$, $t = 1, 2, \dots, T$, of a time series $Y(t)$,
$t = 0, \pm 1, \dots$, various functions that can be estimated and plotted
which provide insight into, and diagnostic measures of, possible
models that fit the observed time series.

Schuster (1898) pioneered techniques of spectral analysis.
To detect hidden periodicities, Schuster proposed calculating
what we today call the sample unnormalized spectral density or
periodogram

$$f_T(\omega) = \frac{1}{T}\left|\sum_{t=1}^{T} Y(t)\exp(-2\pi i t\omega)\right|^2, \quad -0.5 \le \omega \le 0.5.$$

One actually computes and plots $f_T(\omega)$ at an equi-spaced
grid of frequencies $\omega_k = k/S$, $k = 0, 1, \dots, S - 1$, where S is the
spectral computation number. Using the Fast Fourier Transform,
one chooses $T < S < 2T$.

The graph of $f_T(\omega)$ is a very wiggly function. If one
interprets local maxima of $f_T(\omega)$ as indicating "significant
frequencies" representing "hidden periodicities", one obtains
many spurious periodicities.

The notion of the spectral density $f(\omega)$ of a time series
$Y(t)$, $t = 0, \pm 1, \dots$, is defined heuristically by

$$f(\omega) = \lim_{T\to\infty} f_T(\omega).$$

If the limit existed one might call $f(\omega)$ the asymptotic spectral
density of the time series. However the limit does not exist in
any customary mode of convergence.

Wiener (1930) proposed solving the harmonic analysis
problem by defining the sample covariance function $R_T(v)$, which
equals the (inverse) Fourier transform of $f_T(\omega)$:

$$R_T(v) = \begin{cases} \dfrac{1}{T}\displaystyle\sum_{t=1}^{T-v} Y(t+v)\,Y(t), & v = 0, 1, \dots, T-1, \\ 0, & v \ge T, \\ R_T(-v), & v < 0, \end{cases}$$

$$R_T(v) = \int_{-0.5}^{0.5} \exp(2\pi i v\omega)\, f_T(\omega)\, d\omega.$$

The limit whose existence needs to be assumed is

$$R(v) = \lim_{T\to\infty} R_T(v);$$

one calls $R(v)$ the asymptotic covariance function of the time
series. One calls

$$\rho(v) = \frac{R(v)}{R(0)}$$

the asymptotic correlation function; it is the limit of the sample
correlation function

$$\rho_T(v) = \frac{R_T(v)}{R_T(0)}.$$
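Wiener's relation can be checked numerically. This sketch (Python, NumPy; function names hypothetical) computes $R_T(v)$ directly from its definition and again by Fourier-inverting the periodogram on a grid of $S = T + M$ frequencies. With at least $T + M$ grid points the circular correlation produced by the FFT does not wrap around, which is the reason for choosing $S \ge T + M$.

```python
import numpy as np


def sample_covariance_direct(y, max_lag):
    """R_T(v) = (1/T) sum_{t=1}^{T-v} Y(t+v) Y(t), for v = 0, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    T = y.size
    return np.array([np.dot(y[v:], y[:T - v]) / T for v in range(max_lag + 1)])


def sample_covariance_fft(y, max_lag):
    """The same R_T(v), obtained as the inverse Fourier transform of f_T."""
    y = np.asarray(y, dtype=float)
    T = y.size
    S = T + max_lag                                    # spectral computation number
    periodogram = np.abs(np.fft.fft(y, n=S)) ** 2 / T  # f_T(k/S), k = 0..S-1
    # ifft includes the 1/S factor: a Riemann sum for the integral over omega,
    # which is exact here because f_T is a trigonometric polynomial.
    return np.fft.ifft(periodogram).real[: max_lag + 1]


def sample_correlation(y, max_lag):
    """rho_T(v) = R_T(v) / R_T(0): the correlogram ordinates."""
    r = sample_covariance_fft(y, max_lag)
    return r / r[0]


rng = np.random.default_rng(0)
y = rng.standard_normal(100)
y -= y.mean()
```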

The sample correlation function $\rho_T(v)$ is an important
building block for methods of model identification. Its plot
is called the correlogram. One could test for white noise by
testing whether $\rho_T(v)$, $v = 1, 2, \dots, N$, constitute a random normal
data batch.

The cumulative periodogram

$$F_T(\omega) = \int_{-0.5}^{\omega} f_T(\omega')\, d\omega'$$

is a diagnostic tool for providing evidence of hidden
periodicities. If it converges, its limit function $F(\omega)$
provides a spectral representation of $R(v)$:

$$R(v) = \int_{-0.5}^{0.5} \exp(2\pi i v\omega)\, dF(\omega).$$

A probability model under which the asymptotic covariance
function exists is the following: $Y(t)$, $t = 0, \pm 1, \dots$, is a zero
mean Gaussian covariance stationary time series with covariance
function $R(v)$ satisfying (for all t and v)

$$R(v) = E[Y(t+v)\, Y(t)].$$

When the time series is stationary and ergodic, the sample
covariance function converges to the covariance function.
A Gaussian stationary time series is ergodic if and only if

$$\lim_{T\to\infty} \frac{1}{T}\sum_{v=1}^{T} R^2(v) = 0.$$

It is natural to classify a stationary time series into
three classes according to the rate of decay of the correlation
function $\rho(v)$:

white noise (no memory): $\frac{1}{T}\sum_{v=1}^{T} \rho^2(v) = 0$ for all T

ergodic (short memory): $\frac{1}{T}\sum_{v=1}^{T} \rho^2(v) \to 0$ as $T \to \infty$

non-ergodic (long memory): $\frac{1}{T}\sum_{v=1}^{T} \rho^2(v) \not\to 0$

One of the aims of this paper is to discuss the unifying role of
the concept of memory. The foregoing trichotomy indicates that
there are three types of memory (no, short, long). However the
insights into model identification provided by the notion of
memory are captured not by definitions in terms of correlations
(or even partial correlations) but by definitions in terms of the
spectral density function and sample spectral density.

The spectral density function $f(\omega)$, $-0.5 \le \omega \le 0.5$, is defined
as the Fourier transform of the correlation function $\rho(v)$:

$$f(\omega) = \sum_{v=-\infty}^{\infty} e^{-2\pi i v\omega}\rho(v).$$

A sufficient condition for $f(\omega)$ to exist as an ordinary function
is that $\rho(v)$ is absolutely summable. A long memory time series may not
possess a spectral density. To be able to use such a function,
we introduce the sequence of approximating spectral densities

$$\bar f_T(\omega) = \sum_{|v| < T} \exp(-2\pi i v\omega)\,\rho(v)\left(1 - \frac{|v|}{T}\right).$$

The correlation criteria for memory classification can be restated
as equivalent criteria in terms of

$$\mathrm{Var}[\bar f_T] = \int_{-0.5}^{0.5}\{\bar f_T(\omega) - 1\}^2\, d\omega = 2\sum_{v=1}^{T-1}\rho^2(v)\left(1 - \frac{v}{T}\right)^2.$$

However a more useful criterion is the dynamic range of $\bar f_T(\omega)$.
We discuss its definition only for the case that $f(\omega)$ exists.

A stationary time series can have a spectral density $f(\omega)$
and yet not be representable as an autoregressive process. One
needs to assume an additional condition, such as that $f(\omega)$ is bounded
above and below: for some constants $c_1$ and $c_2$,
$0 < c_1 \le f(\omega) \le c_2 < \infty$. The dynamic range of $f(\omega)$ is defined to be

$$\max_{\omega} \log f(\omega) - \min_{\omega} \log f(\omega).$$

Dynamic range classification of memory of a time series:

no memory = dynamic range = 0

short memory = 0 < dynamic range < ∞

long memory = dynamic range = ∞
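As an illustration (Python, NumPy), the dynamic range of the normalized AR(1) spectral density $f(\omega) = (1-\phi^2)/|1 - \phi e^{-2\pi i\omega}|^2$ can be evaluated on a frequency grid. For $|\phi| < 1$ it equals $2\log\{(1+|\phi|)/(1-|\phi|)\}$: finite for every such $\phi$ (short memory) and growing without bound as $\phi \to 1$. The AR(1) example, the grid, and the function names are assumptions made for illustration; the range is taken as the plain difference of the extreme values of $\log f$.

```python
import numpy as np


def ar1_spectral_density(omega, phi):
    """Normalized AR(1) spectral density; its integral over [-0.5, 0.5] is 1."""
    return (1 - phi**2) / np.abs(1 - phi * np.exp(-2j * np.pi * omega)) ** 2


def dynamic_range(f_values):
    """max log f - min log f over the supplied grid of spectral ordinates."""
    log_f = np.log(f_values)
    return float(log_f.max() - log_f.min())


omega = np.linspace(0.0, 0.5, 501)     # f is even, so [0, 0.5] suffices
for phi in (0.0, 0.5, 0.9, 0.99):
    print(phi, dynamic_range(ar1_spectral_density(omega, phi)))
```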

Often, zero frequency is the frequency at which the spectral
density has a behavior causing it to have infinite dynamic range.
We assume that the spectral density $f(\omega)$ is a regularly
varying function, with the representation [called the regular
variation representation at frequency $\omega = 0$]

$$f(\omega) = \omega^{-\delta} L(\omega)$$

where $L(\omega)$ is a slowly varying function. The value of $\delta$ is an
index of length of memory, since

no and short memory = $\delta = 0$

long memory = $\delta \ne 0$

Long memory time series models considered by Mandelbrot (1973),
Granger and Joyeux (1980), and Geweke and Porter-Hudak (1983)
have spectral density $f(\omega)$ satisfying the regular variation
representation. The index $\delta < 0$ corresponds to a zero value for
$f(\omega)$ at $\omega = 0$, while $\delta > 0$ corresponds to an infinite value for
$f(\omega)$ at $\omega = 0$.

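When $\delta > 0$, the spectral density $f(\omega)$ is an integrable function
only for $0 \le \delta < 1$; the correlation function $\rho(v)$ then decays slowly:

$$\rho(v) \sim v^{\delta - 1} \quad \text{as } v \to \infty.$$

The value at $\omega = 0$ of $f(\omega)$ can be $\infty$ and still $\delta = 0$; this holds for

$$f(\omega) \sim (\log \omega)^2 \quad \text{for small } \omega,$$

corresponding to

$$\rho(v) \sim \frac{\log v}{v} \quad \text{as } v \to \infty.$$

A symbolic spectral density $f(\omega)$ with $\delta > 1$ is that of a time
series $Y(\cdot)$ whose first difference $\Delta Y(t) = Y(t) - Y(t-1)$ is
short memory (covariance stationary with spectral density
bounded above and below); then

$$f_Y(\omega) = \frac{f_{\Delta Y}(\omega)}{|1 - e^{-2\pi i\omega}|^2} \sim \frac{f_{\Delta Y}(\omega)}{4\pi^2\omega^2} \quad \text{as } \omega \to 0,$$

and $\delta = 2$.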

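Parzen (1983d) gives explicit formulas for the index $\delta$ in
the context of density-quantile estimation:

$$\delta = \lim_{\omega\to 0}\left\{\int_0^1 \log f(\omega y)\, dy - \log f(\omega)\right\} = \lim_{\omega\to 0}\left\{\frac{1}{\omega}\int_0^{\omega}\log f(\lambda)\, d\lambda - \log f(\omega)\right\}.$$

(Substituting $f(\omega) = \omega^{-\delta}L(\omega)$ and using $\int_0^1 \log y\, dy = -1$
shows that the bracketed quantity tends to $\delta$, because L is slowly varying.)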
To estimate $\delta$ one forms

$$\hat\delta_k = \frac{1}{k}\sum_{j=1}^{k} \log f\!\left(\frac{j}{n}\right) - \log f\!\left(\frac{k}{n}\right)$$

where n and k are integers tending to $\infty$ in such a way that $k/n$
tends to 0. One can show that

$$\delta = \lim_{k\to\infty,\ k/n\to 0} \hat\delta_k.$$

A similar formula can be used to estimate $\delta$ in a regular
variation representation of $f(\omega)$ at a frequency $\omega_0$: represent
$\omega_0 = m/n$ and define

$$\hat\delta_k = \frac{1}{k}\sum_{j=1}^{k} \log f\!\left(\frac{m+j}{n}\right) - \log f\!\left(\frac{m+k}{n}\right).$$

Examples of estimates of $\delta$ are given in Section 7.
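The estimator $\hat\delta_k$ is easy to evaluate once a spectral density (true or estimated) can be computed at the frequencies $(m+j)/n$. The sketch below (Python, NumPy; names hypothetical) applies it to the true spectral density $|1 - e^{-2\pi i\omega}|^{-2d}$ of fractionally differenced white noise, for which $\delta = 2d$, and to a bounded AR(1) spectral density, for which $\delta = 0$. In practice one would plug in an estimate $\hat f$ instead.

```python
import numpy as np


def memory_index(f, k, n, m=0):
    """delta_k = (1/k) sum_{j=1}^{k} log f((m+j)/n) - log f((m+k)/n).

    f maps an array of frequencies to spectral density values; m = 0 gives
    the estimator at zero frequency, m > 0 the one at omega_0 = m/n.
    """
    j = np.arange(1, k + 1)
    log_f = np.log(f((m + j) / n))
    return float(log_f.mean() - log_f[-1])


def fractional_spectrum(omega, d):
    """|1 - exp(-2 pi i omega)|^(-2d): regularly varying at 0 with index 2d."""
    return np.abs(1 - np.exp(-2j * np.pi * omega)) ** (-2 * d)


def ar1_spectrum(omega, phi=0.5):
    """Bounded above and below, hence delta = 0 at omega = 0."""
    return 1.0 / np.abs(1 - phi * np.exp(-2j * np.pi * omega)) ** 2


long_est = memory_index(lambda w: fractional_spectrum(w, 0.3), k=1000, n=10**6)
short_est = memory_index(ar1_spectrum, k=1000, n=10**6)
```

With k = 1000 and n = 10^6 the first estimate lands slightly below 0.6 and the second close to 0. The small downward bias comes from the finite k, since $\frac{1}{k}\sum_{j\le k}\log j - \log k = -1 + O(\log k / k)$.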

We estimate the memory index $\delta$ from consistent estimators
$\hat f(\omega)$ of the spectral density f. We use: (1) the non-parametric
kernel spectral density estimator

$$\hat f(\omega) = \sum_{v=-\infty}^{\infty} k\!\left(\frac{v}{M}\right)\rho_T(v)\exp(-2\pi i\omega v), \quad |\omega| \le 0.5,$$

with truncation point M (in practice, we use $M = T/2$)
and Parzen window

$$k(t) = \begin{cases} 1 - 6t^2 + 6|t|^3, & |t| \le 0.5, \\ 2(1 - |t|)^3, & 0.5 \le |t| \le 1, \\ 0, & \text{otherwise}; \end{cases}$$

and (2) autoregressive spectral density estimators.
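A sketch of the kernel estimator with the Parzen window in Python with NumPy. It assumes the sample correlations $\rho_T(0), \rho_T(1), \dots$ have already been computed and uses $\rho_T(-v) = \rho_T(v)$ to write the two-sided sum as a cosine series; the function names are hypothetical.

```python
import numpy as np


def parzen_window(t):
    """Parzen window k(t)."""
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a <= 0.5, 1 - 6 * a**2 + 6 * a**3,
                    np.where(a <= 1.0, 2 * (1 - a) ** 3, 0.0))


def kernel_spectral_density(rho, M, omega):
    """f(omega) = sum_{|v| < M} k(v/M) rho(v) exp(-2 pi i omega v).

    rho[v] holds rho_T(v) for v = 0, 1, ..., len(rho) - 1.
    """
    rho = np.asarray(rho, dtype=float)
    v = np.arange(1, min(M, rho.size))
    weights = parzen_window(v / M) * rho[v]
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    # rho(-v) = rho(v), so the sum is rho(0) + 2 sum_v w_v cos(2 pi omega v).
    return rho[0] + 2.0 * np.cos(2 * np.pi * np.outer(omega, v)) @ weights
```

For the AR(1) correlation function $\rho(v) = 0.5^{|v|}$ and a large M, $\hat f(0)$ comes out close to the true value $(1 + 0.5)/(1 - 0.5) = 3$; for white noise ($\rho(v) = 0$ for $v \ne 0$) the estimate is identically 1.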

Only examples can show which values of $\delta$ occur in real
series. The goal in estimating $\delta$ is to develop diagnostics
concerning the "detrending" operations to be used to transform
a long memory series to a short memory time series. To model
time series, Box and Jenkins (1970) introduced the ARIMA(p,d,q)
model. Estimation of the parameter d can be approached by
estimating $\delta$. Estimation of p and q can be approached by diverse
order-determining methods involving estimating information.

Determining the degree of differencing: When a time series
$Y(t)$ can be transformed to a stationary time series $Z(t)$ by
differencing d times, one can think of the "spectral density"
$f_Y(\omega)$ of $Y(\cdot)$ as having the representation

$$f_Y(\omega) = |1 - e^{-2\pi i\omega}|^{-2d} f_Z(\omega),$$

which is a special case of assuming that $f_Y(\omega)$ is regularly
varying at $\omega = 0$ with index $\delta = 2d$. The foregoing estimators for $\delta$
may provide alternatives to the techniques for estimating d
which have been proposed by Granger and Joyeux (1980), Janacek
(1982), and Geweke and Porter-Hudak (1983).
5. ARMA models and prediction error memory classification

The concept of an autoregressive process was introduced
by Yule (1927) as an alternative technique for detecting hidden
periodicities, and for estimating the frequency $\omega$ in the time
series model

$$Y(t) = A\cos 2\pi\omega t + B\sin 2\pi\omega t + e(t)$$

where $e(\cdot)$ is white noise. The function $\cos 2\pi\omega t$ satisfies the
second order difference equation

$$Y(t) + a_1 Y(t-1) + a_2 Y(t-2) = 0$$

with $a_1 = -2\cos 2\pi\omega$ and $a_2 = 1$. Yule suggested determining
the coefficients $a_1$ and $a_2$ minimizing

$$\sum_{t=3}^{T}\{Y(t) + a_1 Y(t-1) + a_2 Y(t-2)\}^2.$$

These coefficients may be interpreted as estimators of the
parameters in the "random shock" model

$$Y(t) + a_1 Y(t-1) + a_2 Y(t-2) = e(t)$$

where $e(t)$ is white noise. Thus was born the AR(2) model.
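Yule's idea can be reproduced with ordinary least squares. In this sketch (Python, NumPy; names hypothetical) a noise-free sinusoid is fitted exactly, $a_2$ comes out as 1, and the frequency is recovered by inverting $a_1 = -2\cos 2\pi\omega$; with noise added, the same code gives least squares estimates of the random shock AR(2) model.

```python
import numpy as np


def yule_ar2(y):
    """Least squares a1, a2 minimizing sum_t {Y(t) + a1 Y(t-1) + a2 Y(t-2)}^2."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[1:-1], y[:-2]])   # columns Y(t-1), Y(t-2) for t = 3..T
    target = -y[2:]                           # Y(t) moved to the right-hand side
    (a1, a2), *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(a1), float(a2)


def frequency_from_a1(a1):
    """Invert a1 = -2 cos(2 pi omega) for omega in [0, 0.5]."""
    return float(np.arccos(np.clip(-a1 / 2.0, -1.0, 1.0)) / (2 * np.pi))


t = np.arange(1, 201)
y = 2.0 * np.cos(2 * np.pi * 0.1 * t) + 1.5 * np.sin(2 * np.pi * 0.1 * t)
a1, a2 = yule_ar2(y)
omega_hat = frequency_from_a1(a1)
```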


Autoregressive (AR), moving average (MA), and autoregressive-moving
average (ARMA) schemes now play a central role in time
series analysis, since they provide basic models for time series
model identification, forecasting, and spectral estimation.

One definition of an ARMA(p,q) model for a zero mean
covariance stationary time series $Y(t)$, $t = 0, \pm 1, \dots$, is

$$Y(t) + a_p(1)Y(t-1) + \dots + a_p(p)Y(t-p) = e(t) + b_q(1)e(t-1) + \dots + b_q(q)e(t-q)$$

where $e(t)$ is a white noise time series, and the transfer
functions

$$g_p(z) = 1 + a_p(1)z + \dots + a_p(p)z^p,$$
$$h_q(z) = 1 + b_q(1)z + \dots + b_q(q)z^q$$

have all their roots in the region $|z| > 1$ of the complex z-plane.
In place of the backward shift operator B we use the lag operator L,
defined by $LY(t) = Y(t-1)$. An ARMA(p,q) model is written

$$g_p(L)\, Y(t) = h_q(L)\, e(t).$$
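The root condition on $g_p(z)$ and $h_q(z)$ is easy to check numerically. This hypothetical helper (Python, NumPy) takes the coefficients $c(1), \dots, c(k)$ of $1 + c(1)z + \dots + c(k)z^k$ and reports whether every root lies strictly outside the unit circle.

```python
import numpy as np


def roots_outside_unit_circle(coeffs):
    """True if 1 + c(1) z + ... + c(k) z^k has all its roots in |z| > 1."""
    c = np.asarray(coeffs, dtype=float)
    # np.roots expects coefficients ordered from the highest power down
    # (leading zeros are dropped automatically).
    poly = np.concatenate([c[::-1], [1.0]])
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0))


print(roots_outside_unit_circle([-0.5]))             # AR(1) with coefficient 0.5
print(roots_outside_unit_circle([-1.0]))             # random walk: root on |z| = 1
print(roots_outside_unit_circle([-0.9375, 0.3125]))  # an AR(2) operator
```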


An AR(∞) model is expressed

$$g_\infty(L)\, Y(t) = e(t).$$

An MA(∞) model is expressed

$$Y(t) = h_\infty(L)\, e(t).$$


A model for a stationary time series is an invertible
filter which transforms it to white noise. For a short memory
time series, the whitening filters can always be represented as
AR(∞) or MA(∞) and are approximated by ARMA(p,q) of suitable
orders to be estimated. The white noise $e(t)$ to which we seek
to transform a time series $Y(t)$ consists of the infinite memory
one-step-ahead prediction errors (innovations) $Y^{\nu}(t) = Y(t) - Y^{\mu}(t)$,
where

$$Y^{\mu}(t) = E[Y(t) \mid Y(t-1), Y(t-2), \dots].$$

The white noise sequence $Y^{\nu}(t)$ has mean 0 and variance $\sigma^2_\infty R(0)$,
where

$$\sigma^2_\infty = E[|Y^{\nu}(t)|^2] \div R(0), \quad R(0) = E[|Y(t)|^2].$$

We call $\sigma^2_\infty$ the normalized mean square prediction error of
one-step-ahead infinite memory prediction. The importance of
normalization (which may not currently be standard practice for
all time series analysts) is emphasized by the information theory
approach in the next section. A basic diagnostic tool is the
sequence of memory m normalized mean square prediction errors

$$\sigma^2_m = E[|Y^{\nu,m}(t)|^2] \div R(0),$$
$$Y^{\nu,m}(t) = Y(t) - Y^{\mu,m}(t),$$
$$-Y^{\mu,m}(t) = a_m(1)Y(t-1) + \dots + a_m(m)Y(t-m).$$

Given a true (or sample) correlation function $\rho(v)$, one can
compute (using the Yule-Walker equations) the sequence $\sigma^2_m$, which
converges monotonically to the limit $\sigma^2_\infty$. An alternative approach
to computing $\sigma^2_\infty$ is the fundamental formula

$$\log\sigma^2_\infty = \int_{-0.5}^{0.5}\log f(\omega)\, d\omega.$$

The value of $\sigma^2_\infty$ is a very useful diagnostic measure of the memory
of a time series.
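Both routes to $\sigma^2_\infty$ can be sketched in a few lines (Python, NumPy; function names hypothetical). The Levinson-Durbin recursion solves the Yule-Walker equations order by order, in the sign convention $Y(t) + a_m(1)Y(t-1) + \dots$ used above, and the integral formula (often attributed to Szegő and Kolmogorov) is approximated by the midpoint rule. For an AR(1) with $\rho(v) = \phi^{|v|}$ both give $1 - \phi^2$.

```python
import numpy as np


def levinson_durbin(rho, m_max):
    """sigma_m^2 for m = 0..m_max and the order-m_max coefficients a(1..m_max).

    rho[v] holds rho(v) for v = 0..m_max (rho[0] = 1). Convention:
    Y(t) + a_m(1) Y(t-1) + ... + a_m(m) Y(t-m) is the prediction error.
    """
    rho = np.asarray(rho, dtype=float)
    a = np.zeros(0)
    sigma2 = [1.0]                            # sigma_0^2 = R(0) / R(0)
    for m in range(1, m_max + 1):
        # reflection coefficient (minus the partial correlation) at lag m
        k = -(rho[m] + np.dot(a, rho[m - 1:0:-1])) / sigma2[-1]
        a = np.concatenate([a + k * a[::-1], [k]])
        sigma2.append(sigma2[-1] * (1.0 - k**2))
    return np.array(sigma2), a


def sigma2_from_spectrum(f, n_grid=4096):
    """exp of the integral of log f(omega) over [-0.5, 0.5] (midpoint rule)."""
    omega = (np.arange(n_grid) + 0.5) / n_grid - 0.5
    return float(np.exp(np.mean(np.log(f(omega)))))


phi = 0.6
sigma2, a = levinson_durbin(phi ** np.arange(6), 5)
f_ar1 = lambda w: (1 - phi**2) / np.abs(1 - phi * np.exp(-2j * np.pi * w)) ** 2
```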

Memory classification by Normalized Mean Square Prediction Error:

no memory = $\sigma^2_\infty = 1$

short memory = $0 < \sigma^2_\infty < 1$

long memory = $\sigma^2_\infty = 0$

The estimation of $\sigma^2_\infty$ is one of the basic problems of time
series model identification. One important method is

$$\hat\sigma^2_\infty = \hat\sigma^2_{\hat m}$$

where $\hat m$ is chosen by an order-determining criterion (AIC due to
Akaike or CAT due to Parzen). The pioneering work of Akaike (1974),
(1977) has shown the central role of information theoretic
ideas in defining these criteria.

The next section discusses how to use information
divergence ideas to measure the ability of ARMA(p,q) schemes
to provide approximating models to the exact models (of a
short memory time series) provided by AR(∞) and MA(∞)
representations.
6. Information approach to memory and ARMA schemes

Information divergence of a probability density g from a
(true) probability density f is defined by

$$I(f;g) = \int_{-\infty}^{\infty}\left\{-\log\frac{g(y)}{f(y)}\right\} f(y)\, dy.$$

Information has an important decomposition

$$I(f;g) = H(f;g) - H(f),$$

defining cross-entropy $H(f;g)$ and entropy $H(f)$ by

$$H(f;g) = \int_{-\infty}^{\infty}\{-\log g(y)\}\, f(y)\, dy,$$
$$H(f) = H(f;f) = \int_{-\infty}^{\infty}\{-\log f(y)\}\, f(y)\, dy.$$

The information $I(Y|X)$ about a continuous random variable
Y in a continuous random vector X is defined by

$$I(Y|X) = I(f_{Y|X}; f_Y) = E_X\, I(f_{Y|X=x}; f_Y).$$

The entropy of Y and conditional entropy of Y given X are
defined by

$$H(Y) = H(f_Y),$$
$$H(Y|X) = H(f_{Y|X}) = E_X\, H(f_{Y|X=x}).$$

One can establish a fundamental decomposition

$$I(Y|X) = H(Y) - H(Y|X).$$


Define the information about Y in $X_2$ conditioned on $X_1$ by

$$I(Y|X_1; X_1, X_2) = H(f_{Y|X_1}) - H(f_{Y|X_1,X_2}) = H(Y|X_1) - H(Y|X_1,X_2).$$

A fundamental formula to evaluate an information increment is

$$I(Y|X_1; X_1, X_2) = I(Y|X_1, X_2) - I(Y|X_1).$$

When X and Y are jointly normal random variables, let $\Sigma(Y)$
denote the variance of Y and $\Sigma(Y|X)$ the conditional variance of
Y given X (which does not depend on the value of X). Then

$$H(Y) = \tfrac{1}{2}\log\Sigma(Y) + \tfrac{1}{2}(1 + \log 2\pi),$$
$$H(Y|X) = \tfrac{1}{2}\log\Sigma(Y|X) + \tfrac{1}{2}(1 + \log 2\pi),$$
$$I(Y|X) = -\tfrac{1}{2}\log\{\Sigma^{-1}(Y)\,\Sigma(Y|X)\}.$$
A general approach to memory uses information in the
infinite past about the current value, defined by

$$I_\infty = \lim_{m\to\infty} I_m, \quad I_m = I(Y(m+1) \mid Y(1), \dots, Y(m)).$$

Information Definition of Memory. We define a time series
$Y(t)$, $t = 0, \pm 1, \dots$, to be

no memory if $I_\infty = 0$

short memory if $0 < I_\infty < \infty$

long memory if $I_\infty = \infty$

This definition agrees with the criterion in the previous
section in terms of $\sigma^2_\infty$, since for a stationary Gaussian time
series $I_\infty = -\tfrac{1}{2}\log\sigma^2_\infty$.

Example. A random walk has long memory and white noise has
no memory.

A random walk is defined by $Y(m+1) = Y(m) + e(m+1)$, $Y(0) = 0$,
where $e(t)$ are independent $N(0, \sigma^2)$. Then $\Sigma(Y(m+1)) = (m+1)\sigma^2$,

$$E[Y(m+1) \mid Y(1), \dots, Y(m)] = Y(m), \quad \Sigma(Y(m+1) \mid Y(1), \dots, Y(m)) = \sigma^2,$$

$I_m = \tfrac{1}{2}\log(m+1)$, and $I_\infty = \infty$. A pure white noise is defined by
$Y(m) = e(m)$. Then $\Sigma(Y(m+1)) = \sigma^2$, $E[Y(m+1) \mid Y(1), \dots, Y(m)] = 0$,
$\Sigma(Y(m+1) \mid Y(1), \dots, Y(m)) = \sigma^2$, $I_m = 0$, and $I_\infty = 0$.
Both a random walk and a pure white noise can be regarded
as special cases [corresponding to $\rho = 1$ and $\rho = 0$ respectively]
of the AR(1) model

$$Y(t) = \rho Y(t-1) + e(t), \quad t = 1, 2, \dots$$

where $e(t)$ are independent $N(0, \sigma^2)$. When $|\rho| < 1$, an AR(1)
defines a stationary (or asymptotically stationary) time series
satisfying

$$I_\infty = -\tfrac{1}{2}\log(1 - \rho^2).$$


In order to transform one's thinking about AR(1) models
from $\rho$ to $I_\infty$ one needs a table of corresponding values
of these parameters.

ρ        .1     .2     .3     .4     .5     .6     .7     .8     .9     .95
I∞       .005   .020   .047   .087   .144   .223   .337   .511   .830   1.16

I∞       .25    .5     .75    1.0    1.25   1.50   1.75   2      3      4
ρ        .627   .795   .881   .930   .958   .975   .985   .991   .999   .9998
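The table can be regenerated from $I_\infty = -\frac{1}{2}\log(1-\rho^2)$ and its inverse $\rho = \{1 - \exp(-2I_\infty)\}^{1/2}$; a short sketch in Python with NumPy (function names hypothetical). `log1p` and `expm1` keep the computation accurate when $\rho$ is close to 1.

```python
import numpy as np


def info_from_rho(rho):
    """I_infinity = -(1/2) log(1 - rho^2) for a Gaussian AR(1), |rho| < 1."""
    rho = np.asarray(rho, dtype=float)
    return -0.5 * np.log1p(-rho**2)


def rho_from_info(info):
    """Inverse map rho = sqrt(1 - exp(-2 I_infinity)), taking rho >= 0."""
    info = np.asarray(info, dtype=float)
    return np.sqrt(-np.expm1(-2.0 * info))


rhos = np.array([.1, .2, .3, .4, .5, .6, .7, .8, .9, .95])
infos = np.array([.25, .5, .75, 1.0, 1.25, 1.5, 1.75, 2.0, 3.0, 4.0])
print(np.round(info_from_rho(rhos), 3))
print(np.round(rho_from_info(infos), 4))
```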

A very quick and dirty rule for memory diagnosis is to regard an
observed value $\hat I_\infty \ge 1.5$ as an early detector of very long
memory, and $\hat I_\infty \ge 1.00$ as an early detector of long memory.
This rule is to be used in conjunction with other rules for
discriminating memory type which are given in Section 7.
We next discuss how to interpret an ARMA(p,q) scheme in
terms of information. Let

$$I_{p,q} = I(Y \mid Y_{-1}, \dots, Y_{-p}, Y^{\nu}_{-1}, \dots, Y^{\nu}_{-q})$$

denote the information about $Y(t)$ in $Y(t-1), \dots, Y(t-p)$,
$Y^{\nu}(t-1), \dots, Y^{\nu}(t-q)$. For a Gaussian stationary short memory
time series

$$I_{p,q} = -\tfrac{1}{2}\log\sigma^2_{p,q}$$

where $\sigma^2_{p,q}$ is the normalized mean square error of predicting $Y(t)$
from these variables.
Formulas for $I_{p,q}$ are most conveniently developed in
terms of the coefficients $\beta(1), \beta(2), \dots$ of the MA(∞) representation
of a time series:

$$Y(t) = Y^{\nu}(t) + \beta(1)Y^{\nu}(t-1) + \beta(2)Y^{\nu}(t-2) + \dots$$

There are two methods for estimating the MA(∞) coefficients:
invert AR($\hat m$), where $\hat m$ is chosen by an order-determining
criterion, or derive $\beta(k)$ from estimators of the cepstral
pseudo-correlations

$$\int_{-0.5}^{0.5}\exp(2\pi i v\omega)\log f(\omega)\, d\omega, \quad v = 0, \pm 1, \dots$$

In the Gaussian case, information is (up to a constant)
the logarithm of variance. It may seem that there is no
reason to prefer information to variance. However information
concepts are meaningful even for non-Gaussian series (although
they have not yet been extensively calculated in the non-Gaussian
case). Thus by translating variance into information, one can
eventually transfer one's Gaussian intuition to non-Gaussian
data analysis.

To illustrate the use of information in model identification,
let us consider the loss one sustains in using the best fitting
AR(2) model when the true model is an ARMA(1,1)

$$Y(t) + aY(t-1) = e(t) + b\, e(t-1).$$
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One  can  compute  o£,  p(l),  p ( 2 )  in  terms  of  a  and  b.  The  values 
of  p(l)  and  p(2)  determine  (via  the  Yule-Walker  equations)  the 

A  A  A 

optimal  values  o2,  32(1),  a2  (2).  When  a  =  -.5,  b  -  .5,  one 
obtains  =  .4286,  p(l)  *  .7143,  p(2)  =  .3571;  a\  -  .4418, 
a2(l)  =  -.9378,  a2(2)  =  .3126.  The  information  loss  in  using 
the  approximating  AR(2)  model 

$$Y(t) - .9378\,Y(t-1) + .3126\,Y(t-2) = \varepsilon(t)$$

rather than the exact ARMA(1,1) with $-a = b = .5$ is .015, since

$$I(Y \mid Y_{-1}, Y_{-2}; Y^{\nu}) = \{-\tfrac12 \log \sigma_\varepsilon^2\} - \{-\tfrac12 \log \hat\sigma_2^2\} = .4236 - .4084 = .015 .$$
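The arithmetic of this example can be reproduced directly. The sketch below uses a hypothetical function name and assumes Var Y(t) = 1. It uses the textbook ARMA(1,1) formulas with φ = -a and θ = b, and then solves the Yule-Walker equations for the AR(2) approximation.

```python
import math


def arma11_info_loss(a, b):
    """ARMA(1,1) in the convention Y(t) + a Y(t-1) = e(t) + b e(t-1), with Var Y = 1.
    Returns sigma_e^2, rho(1), rho(2), the Yule-Walker AR(2) coefficients
    (alpha1, alpha2 in Y(t) + alpha1 Y(t-1) + alpha2 Y(t-2) = e(t)),
    the AR(2) residual variance, and the information loss of the AR(2) fit."""
    phi, theta = -a, b  # usual ARMA(1,1) parameters
    c = 1.0 + 2.0 * phi * theta + theta ** 2
    sigma_e2 = (1.0 - phi ** 2) / c  # since Var Y / Var e = c / (1 - phi^2)
    rho1 = (1.0 + phi * theta) * (phi + theta) / c
    rho2 = phi * rho1
    # Yule-Walker solution for the best AR(2) predictor coefficients
    phi1 = rho1 * (1.0 - rho2) / (1.0 - rho1 ** 2)
    phi2 = (rho2 - rho1 ** 2) / (1.0 - rho1 ** 2)
    sigma2_ar2 = 1.0 - phi1 * rho1 - phi2 * rho2
    loss = (-0.5 * math.log(sigma_e2)) - (-0.5 * math.log(sigma2_ar2))
    return sigma_e2, rho1, rho2, -phi1, -phi2, sigma2_ar2, loss


s_e2, r1, r2, al1, al2, s2, loss = arma11_info_loss(-0.5, 0.5)
print(s_e2, r1, r2, al1, al2, s2, loss)
```

With a = -.5 and b = .5 it returns σ_ε² = 3/7 ≈ .4286, ρ(1) = 5/7 ≈ .7143, ρ(2) = 5/14 ≈ .3571, Yule-Walker coefficients -.9375 and .3125, residual variance 99/224 ≈ .4420, and an information loss of about .0154. These coefficients differ from the quoted -.9378 and .3126 only in the fourth decimal. The likely reason is that the text works from correlations rounded to four digits.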

Estimating MA(∞) is also a prerequisite to using another criterion that we use to estimate memory: the Prediction Variance Horizon function, introduced in Parzen (1981). It provides a quantitative method of measuring memory (especially medium memory) by HORIZON, defined as the smallest value of h for which

$$\frac{1 + \beta_1^2 + \cdots + \beta_{h-1}^2}{1 + \beta_1^2 + \beta_2^2 + \cdots} > 0.95 .$$

The left hand side of the above inequality can be interpreted as the mean square error of prediction h steps ahead, relative to the variance of Y(t).
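HORIZON is easy to compute once the MA(∞) coefficients are available. The sketch below uses a hypothetical function name and assumes a finite truncation of the β sequence, so the denominator is approximated by a partial sum.

```python
def prediction_variance_horizon(beta, level=0.95):
    """Smallest h such that
    (beta_0^2 + ... + beta_{h-1}^2) / (beta_0^2 + beta_1^2 + ...) > level,
    where beta = [1, beta_1, beta_2, ...] is a truncated MA(infinity) sequence.
    Returns None if no h qualifies (possible only when level >= 1)."""
    total = sum(b * b for b in beta)
    partial = 0.0
    for h, b in enumerate(beta, start=1):
        partial += b * b
        if partial / total > level:
            return h
    return None


# AR(1) with coefficient phi has beta_k = phi**k, so the ratio is 1 - phi**(2h).
print(prediction_variance_horizon([0.5 ** k for k in range(200)]))
print(prediction_variance_horizon([0.9 ** k for k in range(500)]))
```

For an AR(1) with coefficient φ, β_k = φ^k and the ratio equals 1 - φ^{2h}. Thus φ = .5 gives HORIZON 3, and φ = .9 gives HORIZON 15.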




7.  Quantile  based  time  series  diagnostics,  and  their 
representative  values 

This  section  introduces  various  quantile  based  time  series 
diagnostic  measures.  Their  use  can  be  considered  exploratory 
data  analysis  since  they  require  no  theory  for  interpretation  if 
one  is  willing  to  base  one’s  conclusions  on  the  empirically 
observed  values  of  the  criteria  for  representative  time  series. 
On  the  other  hand,  the  criteria  are  based  on  clearly  stated 
concepts of probability theory, and one could study theoretically
the  distribution  of  the  criteria  for  various  time  series  models. 

Quantile  diagnostics  of  normality  of  data.  A  diagnostic 
measure  of  the  shape  of  a  distribution  is  the  log  standard 
deviation  of  the  informative  quantile  function,  denoted  LNSDIQ, 
and  defined  by 


$$\mathrm{LNSDIQ} = \log \frac{\text{standard deviation of original data}}{\text{twice interquartile range}}$$

For a normal distribution, the interquartile range equals 1.35 standard deviations; therefore LNSDIQ = -log 2.7, which is approximately -1. We can regard a significant difference of
LNSDIQ  from  -1  as  an  indication  that  the  probability  distribution 
of the data is not normal (Gaussian). A more formal test of normality is to compare LNSDIQ with LNSGMO = $\log \sigma_Q$, where

$$\sigma_Q = \int_0^1 \Phi^{-1}(u)\, \mathrm{IQ}(u)\, du$$

is the score deviation (an efficient estimator of σ for a normal distribution, obtained as a linear combination of order statistics). This test (analogous to the Shapiro-Wilk test for normality) requires further theory, since we find examples in which the data have IQ(u) plots that are not normal (confirmed by LNSDIQ different from -1), yet LNSDIQ and LNSGMO are not different.
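The following is a minimal Python sketch of the two shape diagnostics. It assumes the informative quantile function is the normalized quantile function IQ(u) = (Q(u) - Q(.5)) / (2(Q(.75) - Q(.25))). It evaluates the sample quantile function at u = (j - ½)/n, uses the standard library's default quartile method, and approximates the integral for σ_Q by a Riemann sum. ARSPIQ's own conventions may differ in these details. The function name is hypothetical, and the sketch requires Python 3.8+ for `statistics.NormalDist` and `statistics.quantiles`.

```python
import math
import random
import statistics


def lnsdiq_lnsgmo(data):
    """Return (LNSDIQ, LNSGMO), with IQ(u) = (Q(u) - median) / (2 * IQR)."""
    xs = sorted(data)
    n = len(xs)
    q25, q50, q75 = statistics.quantiles(xs, n=4)  # sample quartiles
    iqr = q75 - q25
    lnsdiq = math.log(statistics.stdev(xs) / (2.0 * iqr))
    # Informative quantile function at u_j = (j + 1/2)/n, j = 0..n-1 (sorted order).
    iq = [(x - q50) / (2.0 * iqr) for x in xs]
    nd = statistics.NormalDist()  # standard normal, supplies Phi^{-1}
    # Riemann sum for sigma_Q = integral_0^1 Phi^{-1}(u) IQ(u) du
    sigma_q = sum(nd.inv_cdf((j + 0.5) / n) * iq[j] for j in range(n)) / n
    return lnsdiq, math.log(sigma_q)


rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
lnsdiq, lnsgmo = lnsdiq_lnsgmo(normal_sample)
print(lnsdiq, lnsgmo)
```

For a large normal sample, both values come out close to -1 and close to each other, as the text leads one to expect.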

To  decide  whether  data  is  normal,  the  entire  graph  of  the 
informative  quantile  [IQ(u)]  function  should  be  examined. 
However, an early detector of the shape is provided by the value of LNSDIQ, as indicated by the following empirical values:

Variable                    Î∞      LNSDIQ
Cauchy white noise          0       -1.14
Airlines log monthly        1.38    -1.14
NYC Monthly Births          .93     -1.24
Lines + Noise               1.72    -1.34
Cauchy random walk          1.48    -1.34
NYC Monthly Temperature     1.17    -1.32
Normal random walk          1.11

In the tables in this section, $I_\infty = -\tfrac12 \log \sigma_\infty^2$ is estimated by $\hat I_m = -\tfrac12 \log \hat\sigma_m^2$ for the approximating AR(m) scheme, where the order m is determined by the AIC criterion (or, equally, the CAT criterion).
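The estimate $\hat I_m$ and the AIC order can be sketched with the Durbin-Levinson recursion on sample correlations. This sketch rests on several assumptions. It uses Yule-Walker estimates, although the appendix notes that ARSPIQ also offers Burg estimation. It takes AIC(m) = T log $\hat\sigma_m^2$ + 2m, with $\hat\sigma_m^2$ the normalized residual variance. The function names are hypothetical.

```python
import math
import random


def autocorrelations(y, max_lag):
    """Sample correlations rho_hat(0..max_lag), normalized by the lag-0 sum of squares."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


def durbin_levinson(rho, max_order):
    """Yule-Walker AR fits of orders 0..max_order from correlations rho[0..max_order].
    alphas[m]: coefficients in Y(t) + alpha_1 Y(t-1) + ... + alpha_m Y(t-m) = e(t)
    sigma2[m]: residual variance of the order-m fit (Var Y normalized to 1)
    pacf[m-1]: partial correlation of order m"""
    phi = []  # predictor coefficients: Y(t) ~ phi_1 Y(t-1) + ... + phi_m Y(t-m)
    sigma2 = [1.0]
    alphas = [[]]
    pacf = []
    for m in range(1, max_order + 1):
        k = (rho[m] - sum(phi[j] * rho[m - 1 - j] for j in range(m - 1))) / sigma2[-1]
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
        sigma2.append(sigma2[-1] * (1.0 - k * k))
        alphas.append([-p for p in phi])
        pacf.append(k)
    return alphas, sigma2, pacf


def aic_order_and_information(y, max_order):
    """Order minimizing AIC(m) = T log sigma2[m] + 2m, and I_hat = -(1/2) log sigma2[m_hat]."""
    T = len(y)
    _, sigma2, _ = durbin_levinson(autocorrelations(y, max_order), max_order)
    aic = [T * math.log(s) + 2 * m for m, s in enumerate(sigma2)]
    m_hat = min(range(len(aic)), key=aic.__getitem__)
    return m_hat, -0.5 * math.log(sigma2[m_hat])


# Simulated Gaussian AR(1) with rho = .9 (true I_infinity = -(1/2) log .19, about .83).
rng = random.Random(11)
y_sim = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - 0.81)]
for _ in range(1999):
    y_sim.append(0.9 * y_sim[-1] + rng.gauss(0.0, 1.0))
print(aic_order_and_information(y_sim, 10))
```

When fed the exact correlations [1, 5/7, 5/14] of the ARMA(1,1) example earlier in this paper, the recursion gives the AR(2) coefficients -.9375 and .3125 and the residual variance 99/224.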

Periodogram. For a white noise time series whose random variables have finite second moment, the quantile function of the normalized periodogram should be that of an exponential distribution with mean 1. A test of white noise is provided by examining IQ(u) and the median and variance of the periodogram. For white noise

Periodogram median = log 2 = .69,

Periodogram variance = 1.
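The following is a sketch of the periodogram diagnostics. It assumes the periodogram is normalized by the sample variance and evaluated at the Fourier frequencies k/T strictly between 0 and 1/2. For self-containment it uses a direct O(T²) discrete Fourier transform; an FFT would be used in practice. The function name is hypothetical.

```python
import cmath
import math
import random
import statistics


def normalized_periodogram(y):
    """Periodogram at the Fourier frequencies k/T, 0 < k < T/2, divided by the
    sample variance, so that white noise gives approximately Exp(1) ordinates."""
    T = len(y)
    mean = sum(y) / T
    d = [v - mean for v in y]
    r0 = sum(v * v for v in d) / T  # sample variance
    ordinates = []
    for k in range(1, (T - 1) // 2 + 1):
        s = sum(d[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T))
        ordinates.append(abs(s) ** 2 / (T * r0))
    return ordinates


rng = random.Random(7)
white = [rng.gauss(0.0, 1.0) for _ in range(1024)]
per = normalized_periodogram(white)
print(statistics.median(per), statistics.variance(per))
```

For Gaussian white noise the median should be near log 2 ≈ .69 and the variance near 1, apart from sampling error.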

As memory increases, the periodogram median decreases and the periodogram variance increases, as the following empirical results confirm [the values for AR(1) processes are based on the table "Quantile Memory Analysis of Simulated AR(1)" in the Appendix].


Periodogram median

.89   Cauchy white noise
.7    Normal white noise
.2    Normal AR(1), ρ = .8
.08   Normal AR(1), ρ = .9
.02   Normal AR(1), ρ = .99
.08   NYC Births monthly
.06   NYC Temperatures monthly
.04   Normal random walk
.03   Airlines log monthly
.03   Cauchy random walk
.02   Lines plus noise


Periodogram variance

67.7  Lines plus noise
49.8  NYC Temperatures monthly
41.5  Normal random walk
38.3  Cauchy random walk
39.7  Airline log monthly
33.1  NYC Births monthly
42.   Normal AR(1), ρ = .99
22.   Normal AR(1), ρ = .9
1     Normal white noise
.5    Cauchy white noise
Correlations. As a memory diagnostic, we use the mean square of the sample correlations $\hat\rho(v)$, v = 1, 2, ...,

$$\frac{1}{N} \sum_{v=1}^{N} \hat\rho^2(v),$$

computed for a large value of N. It is zero for white noise and increases with memory. Some empirical values are:


.002  Cauchy white noise
.004  Normal white noise
.01   Normal AR(1), ρ = .7
.1    Normal AR(1), ρ = .9
.2    Normal AR(2), ρ = .99
.14   NYC Births monthly
.18   Normal random walk
.17   Cauchy random walk
.19   Airlines log monthly
.23   Line plus noise
.26   NYC Temperatures monthly
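The correlation mean square can be sketched as follows. The number of lags N is left to the user, because the text only says it is large. The AR(1) generator is a hypothetical helper for illustration, and correlations are normalized by the lag-0 sum of squares.

```python
import random


def correlation_mean_square(y, n_lags):
    """(1/N) * sum_{v=1}^{N} rho_hat(v)^2, with N = n_lags."""
    T = len(y)
    mean = sum(y) / T
    d = [v - mean for v in y]
    c0 = sum(v * v for v in d)
    total = 0.0
    for v in range(1, n_lags + 1):
        rho_v = sum(d[t] * d[t + v] for t in range(T - v)) / c0
        total += rho_v ** 2
    return total / n_lags


def simulate_ar1(phi, T, rng):
    """Hypothetical helper: stationary Gaussian AR(1) Y(t) = phi Y(t-1) + e(t)."""
    y = [rng.gauss(0.0, 1.0) / (1.0 - phi * phi) ** 0.5]
    for _ in range(T - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


rng = random.Random(3)
cms_white = correlation_mean_square([rng.gauss(0.0, 1.0) for _ in range(500)], 50)
cms_ar = correlation_mean_square(simulate_ar1(0.9, 500, rng), 50)
print(cms_white, cms_ar)
```

For white noise each $\hat\rho^2(v)$ is of order 1/T, so the mean square is small. For a strongly correlated AR(1) it is an order of magnitude larger, which matches the ordering in the list above.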


Delta estimators. We regard a conclusion that a time series is long memory as valid only when it is confirmed by the behavior of the sequence of estimators $\hat\delta_m$ of the memory index δ. We routinely form these estimators at ω = 0 and ω = 1/12. Note that 1/12 is the frequency of an annual cycle in monthly data; the program permits the specification of any other seasonal frequency. Two sequences of estimators $\hat\delta_m$ are formed: one from the best approximating AR scheme, and one from Parzen window estimators with truncation point approximately equal to T/2, where T is the time series sample size [the time series examined had T = 144 to 200].


Our "estimator" $\hat\delta$ is currently only a summary of the behavior of the sequences $\hat\delta_m$, indicating a value about which there is clustering. For normal AR(1) schemes at ω = 0 the following typical values were found in simulated series.

approximate δ̂    when Î∞ is     ρ
2                1.75, 2        .99
1.5              1.25, 1.50     .96
1                1              .93
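For a Gaussian AR(1) with lag-one correlation ρ, normalized to unit variance, the one-step prediction error variance is 1 - ρ². Hence I∞ = -½ log(1 - ρ²). A short check, using a hypothetical function name, shows that the ρ values in the table correspond to I∞ ≈ 1.96, 1.27 and 1.00, in line with the Î∞ column.

```python
import math


def ar1_information(rho):
    """I_infinity = -(1/2) log(1 - rho^2) for a unit-variance Gaussian AR(1)."""
    return -0.5 * math.log(1.0 - rho * rho)


for rho in (0.99, 0.96, 0.93):
    print(rho, round(ar1_information(rho), 2))
```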

For empirical series we observed the following estimators δ̂:

                            ω = 0                 ω = 1/12
                            Best AR   Parzen      Best AR   Parzen
                                      window                window
Lines + Noise               1.98      2.22        .38       .51
Cauchy random walk          1.84      1.84        .37       .48
Airlines log monthly        2.33      2.22        1.56      1.42
NYC Temperatures Monthly    -.4       -.8         2.1       2.6
NYC Births Monthly          2.05      1.74        1.12      .77

Note that a negative value of $\hat\delta$ at ω = 0 indicates the possibility that the spectral density f(ω) is zero at ω = 0.

Partial correlations. The sequence of partial correlations is usually used to diagnose whether the time series obeys an autoregressive scheme, since AR(p) is equivalent to partial correlations equal to 0 for orders greater than p. The quantile function of the partial correlations should then look like that of white noise plus as many outliers as the order of the scheme. As
diagnostic  measures  of  memory  we  compute: 

PCIQR = interquartile range of the quantile function of partial correlations;

PCLNSD = log standard deviation of the informative quantile function IQ(u) of partial autocorrelations;

PCOUT = number of partial correlations greater in absolute value than twice the interquartile range, that is, the number of values of u at which |IQ(u)| > 1.

Typical  values  of  these  measures  for  representative  time  series 
will  be  published  elsewhere. 
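The three partial correlation measures can be sketched in Python. The sketch assumes that partial correlations come from the Durbin-Levinson recursion on sample correlations, that quartiles use the standard library's default method, and that IQ is the normalization (x - median)/(2 × interquartile range). The function names are hypothetical, and no representative values are implied.

```python
import math
import statistics


def sample_acf(y, max_lag):
    """Sample correlations rho_hat(0..max_lag); feed these to partial_correlations."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


def partial_correlations(rho, max_order):
    """Durbin-Levinson recursion; returns the partial correlations of orders 1..max_order."""
    phi, v, pc = [], 1.0, []
    for m in range(1, max_order + 1):
        k = (rho[m] - sum(phi[j] * rho[m - 1 - j] for j in range(m - 1))) / v
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
        v *= 1.0 - k * k
        pc.append(k)
    return pc


def pc_diagnostics(pc):
    """Return (PCIQR, PCLNSD, PCOUT) for a list of partial correlations."""
    q25, q50, q75 = statistics.quantiles(pc, n=4)
    iqr = q75 - q25
    iq = [(x - q50) / (2.0 * iqr) for x in pc]  # informative quantile values
    pclnsd = math.log(statistics.stdev(iq))
    pcout = sum(1 for z in iq if abs(z) > 1.0)  # values with |IQ| > 1
    return iqr, pclnsd, pcout


# Exact AR(1) correlations: only the first partial correlation is nonzero.
print(partial_correlations([0.5 ** k for k in range(6)], 5))
# A hand-made set of partial correlations with one clear outlier.
print(pc_diagnostics([0.9, 0.01, -0.02, 0.03, -0.01, 0.02, 0.0, -0.03, 0.01]))
```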


8.  ARSPIQ  analysis  of  simulated  long  memory  series 

To  illustrate  their  research  on  long  memory  time  series 
models,  Granger  and  Joyeux  (1980)  generated  series  of  the  form 


$$(I-L)^d\, Y(t) = \varepsilon(t)$$

with spectral density (for some constant c)

$$f_Y(\omega) = c\,(1 - \cos 2\pi\omega)^{-d}.$$

This spectral density is regularly varying at ω = 0 with memory index δ = 2d. They generated two series of length 400, corresponding to d = .25 (δ = .5) and d = .45 (δ = .9). We call these series White δ.5 and White δ.9 respectively. I would
like  to  thank  Clive  Granger  and  Roselyne  Joyeux  for  having  given 
us  copies  of  their  series  to  study.  Some  of  the  diagnostics 
generated  by  ARSPIQ  are  as  follows: 


                              White δ.5   White δ.9
DATA LNSDIQ                   -.95        -1.03
DATA LNSGMO                   -.95        -1.03
Variance Periodogram          6.9         10.9
Median Periodogram            .54         .30
Correlation Mean Square       .02         .03
Delta Estimator ω = 0
  Best AR                     0.9         1.0
  Parzen Window               0.6         1.2
AIC order m̂                   7           4
Î∞ = -½ log σ̂²                .14         .35
Prediction Variance Horizon   24          20
Comparing these diagnostics with the values obtained for various series in Section 7, we might conclude the following characteristics for the series.

Data LNSDIQ, LNSGMO     Normal
Corr. Mean Square       Short memory
Periodogram, Var        Short memory
Periodogram, Median     Short memory
Î∞                      Short memory
Pred. Var. Hor.         Medium memory
Delta ω = 0             Long memory


Printer plots of delta estimators are given in Figures 5, 6, 11, 12. One does not currently get an exact numerical estimate of δ, but the values estimated for δ are consistent with the theoretical values of δ used in generating the time series. On the basis of the foregoing diagnostics, one would be justified in recommending a fractional differencing of the time series, using a rough estimate of δ.
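Since δ = 2d for the Granger-Joyeux model, a rough estimate $\hat\delta$ suggests applying $(I-L)^{\hat d}$ with $\hat d = \hat\delta/2$. The following is a minimal sketch. It assumes the binomial expansion of $(1-L)^d$ is truncated at the start of the sample, so pre-sample values are treated as zero. The value δ̂ = 0.9 and the function names are hypothetical, and the paper does not specify how the differencing would be carried out.

```python
def frac_diff_weights(d, n):
    """First n coefficients pi_k of (1 - L)^d = sum_k pi_k L^k,
    via pi_0 = 1 and pi_k = pi_{k-1} (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def frac_diff(y, d):
    """(I - L)^d applied to y, using only observed values (pre-sample values taken as 0)."""
    w = frac_diff_weights(d, len(y))
    return [sum(w[k] * y[t - k] for k in range(t + 1)) for t in range(len(y))]


delta_hat = 0.9  # hypothetical rough estimate of delta
d_hat = delta_hat / 2.0
print(frac_diff_weights(d_hat, 4))
```

With d = 1 the weights reduce to those of ordinary differencing (1, -1, 0, 0, ...), which is a quick sanity check on the recursion.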

If one fitted an ARMA model to these series, one might be tempted to fit ARMA(1,1) models: for White δ.5,

$$Y(t) - .75\,Y(t-1) = \varepsilon(t) - .47\,\varepsilon(t-1);$$

for White δ.9,

$$Y(t) - .89\,Y(t-1) = \varepsilon(t) - .44\,\varepsilon(t-1).$$
By comparing the spectral distribution function of these ARMA schemes with the cumulative periodogram, one would see that the ARMA models inadequately modeled the low frequency portion of the spectral distribution function.

It remains an open question whether expert practitioners of purely time domain ARMA or ARIMA methods of time series analysis could identify the model generating the series simulated by Granger and Joyeux.


9. Does the airline data fit the airline model?


The aim of time series modeling is to find a filter that transforms the time series to white noise. A possible model identification procedure is to guess a model, estimate its parameters, form the residuals, and test whether the residuals are significantly different from white noise. In practice this procedure may lead two different analysts to infer two different models. It remains an open question how to resolve which model to accept (which model is "better"). The concept of memory seems to provide a characteristic of a time series which can be estimated non-parametrically. Statisticians must decide whether to accept the following as a model fitting criterion: a model fitted to a time series must satisfy the criterion that its memory characteristics agree with those estimated from the data.

The operation of this criterion can be illustrated by a classic series used as a test case by researchers on time series model identification methods: the log international airline passengers series. The model fitted by Box and Jenkins (1970) to this series has become celebrated as the "airline model". It takes 1st and 12th differences of the series Y(t) to form a short memory time series $\tilde Y(t)$:

$$(I-L)(I-L^{12})\,Y(t) = \tilde Y(t);$$

$\tilde Y(t)$ is modeled as a special (multiplicative) form of MA(13):

$$\tilde Y(t) = (I - \theta_1 L)(I - \theta_{12} L^{12})\,\varepsilon(t).$$
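Multiplying out the airline model's operators shows which lags they involve. In the sketch below, the parameter values θ1 = .4 and θ12 = .6 are hypothetical placeholders, because no fitted values are given here. The function names are illustrative.

```python
def poly_mul(p, q):
    """Coefficients of the product of two lag polynomials (index = power of L)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def airline_operators(theta1, theta12):
    """Expanded (I - L)(I - L^12) and (I - theta1 L)(I - theta12 L^12)."""
    seasonal = [1.0] + [0.0] * 11 + [-1.0]  # I - L^12
    diff = poly_mul([1.0, -1.0], seasonal)
    ma = poly_mul([1.0, -theta1], [1.0] + [0.0] * 11 + [-theta12])
    return diff, ma


diff, ma = airline_operators(0.4, 0.6)  # hypothetical parameter values
print([(lag, c) for lag, c in enumerate(ma) if c != 0])
```

The moving average operator has nonzero coefficients at lags 1, 12 and 13, and the lag 13 coefficient equals θ1θ12.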


Parzen (1982) has suggested that 12th differences might suffice as an operation which transforms the original series (which has long memory) to a new series which is just barely short memory. The diagnostics in the table [which one interprets by comparing them with the representative values in Section 7] indicate that 12th differencing does suffice to yield short memory.

                              Log Airline   Log Airline
                                            12th difference
Data LNSDIQ                   -1.15         -.97
Data LNSGMO                   -1.16         -.97
Periodogram Median            .03           .19
Periodogram Variance          39.7          7.7
Correlation Mean Sq.          .19           .05
Delta Estimate ω = 0
  Best AR                     2.33          0
  Parzen Window               2.22          0
Delta Estimate ω = 1/12
  Best AR                     1.56          0
  Parzen Window               1.42          0
Î∞ = -½ log σ̂²                1.38          .5
Prediction variance horizon   51            66+

Note on how we form the estimator $\hat\delta$: we write $\hat\delta = 0$ to indicate that the sequence $\hat\delta_m$ oscillates between negative and positive values. Negative values could indicate δ < 0 and the presence of a zero of the spectral density. In our current state of knowledge, we assign a value to $\hat\delta$ representing essentially flat behavior of $\hat\delta_m$. If the 12th difference spectral density had a zero at ω = 0 or ω = 1/12, we would suspect that we had over-differenced.


A quantitative measure of memory is the prediction variance horizon [51 for airline, >66 for 12th difference]; one concludes that the differenced time series still has significant trend components (long memory). The ARARMA modeling procedure of Parzen (1982) finds that if one transforms the airline series by the operator $I - 1.02L^{12}$ rather than by $I - L^{12}$, one does obtain a time series which is unequivocally short memory.


10. ARSPIQ Analysis of 12th difference of white noise

The  ability  of  ARSPIQ  to  identify  time  series  models  may 
be  well  illustrated  by  an  analysis  of  a  simulated  time  series 

$$Y(t) = \varepsilon(t) - \varepsilon(t-12),$$

where ε(t) is N(0,1) white noise. A sample of size T = 200 was simulated. It had mean .02, median .01 and variance 2.16. The DATA diagnostics LNSDIQ = -1.04 and LNSGMO = -1.04 indicate that the data is normal.
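For unit noise variance, the true spectral density of this series is $f(\omega) = |1 - e^{-2\pi i 12\omega}|^2 = 2(1 - \cos 24\pi\omega)$. It vanishes at ω = 0 and ω = 1/12, and it integrates to the variance 2. A small sketch with a hypothetical function name checks both facts. It uses its own random seed, so it will not reproduce the sample statistics quoted above.

```python
import math
import random


def spectral_density_seasonal_diff(omega, s=12, noise_var=1.0):
    """Spectral density of Y(t) = e(t) - e(t - s):
    noise_var * |1 - exp(-2 pi i s omega)|^2 = 2 noise_var (1 - cos(2 pi s omega))."""
    return 2.0 * noise_var * (1.0 - math.cos(2.0 * math.pi * s * omega))


rng = random.Random(0)
e = [rng.gauss(0.0, 1.0) for _ in range(212)]
y = [e[t] - e[t - 12] for t in range(12, 212)]  # T = 200 observations
mean = sum(y) / len(y)
var = sum((v - mean) ** 2 for v in y) / len(y)  # population value is 2
print(spectral_density_seasonal_diff(0.0), spectral_density_seasonal_diff(1 / 12), var)
```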

The diagnostics

Periodogram median          .38
Periodogram variance        2.63
Correlation mean square     .01
Best AR order m             24
Î∞ = -½ log σ̂²              .27

indicate that the time series is short memory. But the AR spectral density estimator does not perform well.

The delta diagnostics indicate that the time series is long memory. Significantly negative values of $\hat\delta$ indicate that the spectral density has zeroes at frequencies ω = 0 and ω = 1/12:

Delta estimate        ω = 0    ω = 1/12
Best AR (m = 24)      -1.9     -1.2
Parzen window         -1.6     .9
To estimate the prediction variance horizon [and an ARMA scheme by select regression on the covariance matrix of Y(t-j), $Y^{\nu}(t-k)$], we fit an MA(∞) by inverting an AR(96) whose coefficients are computed by a Burg algorithm; it estimates $\hat I_\infty = .63$ and a prediction horizon > 100, and it chooses the model

$$Y(t) + .41\,Y(t-12) = \varepsilon(t) - .55\,\varepsilon(t-12).$$

This ARMA spectral density has exactly the shape of the true spectral density of Y(·).
11. Quantile graphics printer plots illustrated

The printer plot graphical output generated by ARSPIQ is illustrated for the long memory simulated series White δ.5 and White δ.9, which are labelled J0Y1 and J0Y2 respectively on the attached output.

Informative quantile functions of the original time series J0Y1 and J0Y2 are plotted in Figures 1 and 7 respectively (with letters 0 and M); the IQ(u) plots indicate normality, confirmed by the D(u) plots in Figures 2 and 8.

Informative  quantile  function  of  the  periodogram  of  time 
series  J0Y1  and  J0Y2  are  plotted  in  Figures  3  and  9 
respectively;  they  are  not  exactly  exponential,  as  is  confirmed 
by  D(u)  plots  in  Figures  4  and  10. 

The index δ of regular variation of the spectral density at zero frequency is estimated by the "limit" of the sequence $\hat\delta_m$ plotted in Figures 5 and 11 (using the AR spectral density estimator) and Figures 6 and 12 (using the Parzen window spectral density estimator). In Figure 5, a limit exists which is approximately 0.9; in Figure 6, one may assign a limit value of approximately 0.6. In Figure 11 the limit is assigned to be approximately 1; in Figure 12, the limit is assigned to be approximately 1.2.

Figures 13 and 14 represent covariances of the time series Y(t) and its innovations $\varepsilon(t) = Y^{\nu}(t)$, estimated for input into the "ARMA identification by select regression" procedure. The last column is the Prediction Variance Horizon function.
12. Concluding Remarks

It is important to understand the role of memory when using [for time series model identification] the ARIMA(p,d,q) models introduced by Box and Jenkins (1970). Memory is related to d, but not to the orders p and q. An AR(1) process Y(t) satisfying $g_1(L)\,Y(t) = \varepsilon(t)$, where $g_1(z) = 1 - \rho z$, is diagnosed as long memory when the transfer function $g_1(z)$ has its root 1/ρ close to the unit circle in the complex z-plane. An example of a long memory population correlation function is $\rho(v) = \cos 2\pi\omega_0 v$, which can be regarded as corresponding to an AR(2) scheme whose transfer function $g_2(z) = 1 - (2\cos 2\pi\omega_0)\,z + z^2$ has roots on the unit circle. In the ARSPIQ approach to time series model identification, roots are not explicitly evaluated because their role is subsumed by memory.
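Both claims about the AR(2) example are quick to verify. The roots of $g_2(z) = 1 - (2\cos 2\pi\omega_0)z + z^2$ are $e^{\pm 2\pi i \omega_0}$, and $\rho(v) = \cos 2\pi\omega_0 v$ satisfies the AR(2) recursion $\rho(v) = 2\cos(2\pi\omega_0)\,\rho(v-1) - \rho(v-2)$. The sketch below checks both numerically. Its function names are hypothetical, and ω₀ = 1/12 is chosen only for illustration.

```python
import cmath
import math


def ar2_roots(omega0):
    """Roots of g2(z) = 1 - (2 cos 2 pi omega0) z + z^2."""
    b = -2.0 * math.cos(2.0 * math.pi * omega0)
    disc = cmath.sqrt(b * b - 4.0)  # purely imaginary unless omega0 is 0 or 1/2
    return (-b + disc) / 2.0, (-b - disc) / 2.0


def cosine_correlation(omega0, v):
    """rho(v) = cos(2 pi omega0 v)."""
    return math.cos(2.0 * math.pi * omega0 * v)


def satisfies_ar2_recursion(omega0, max_lag=30, tol=1e-9):
    """Check rho(v) = 2 cos(2 pi omega0) rho(v-1) - rho(v-2) for v = 2..max_lag."""
    c = 2.0 * math.cos(2.0 * math.pi * omega0)
    return all(
        abs(cosine_correlation(omega0, v)
            - (c * cosine_correlation(omega0, v - 1) - cosine_correlation(omega0, v - 2))) < tol
        for v in range(2, max_lag + 1)
    )


omega0 = 1.0 / 12.0  # illustrative choice
z1, z2 = ar2_roots(omega0)
print(abs(z1), abs(z2), satisfies_ar2_recursion(omega0))
```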

The  models  automatically  identified  by  ARSPIQ  have  been 
found  in  practice  to  have  the  same  quality  as  exact  models  for 
purposes  of  forecasting  and  spectral  estimation.  Other 
diagnostics  of  model  structure  (such  as  correlations,  partial 
correlations,  and  inverse  correlations)  are  also  generated  in 
ARSPIQ  and  can  be  used  in  traditional  ways  to  guess  model 
structure. 

There  are  still  many  open  problems  in  the  theory  of  time 
series  model  identification,  such  as  tests  to  determine  which  of 
several  possible  models  fits  best.  FUN.STAT  (statistical 
reasoning  based  on  quantiles,  entropy  and  information,  and 

The ARSPIQ Fortran Computer Program for Time Series Model
Identification  by  estimating  information  and  memory  is  used  at 
Texas  A&M  in  a  batch  mode.  It  generates  the  following  output 
for  examination  by  the  time  series  analyst. 

1. Quantile data analysis of original data: IQ(u)
   Goodness of fit of normal distribution: D(u)
   LNSDIQ, LNSGMO
   Generates time series Y(t) with median subtracted

2. Quantile data analysis of normalized periodogram: IQ(u)
   Goodness of fit of exponential distribution: S(u)
   Median periodogram, variance periodogram
   Delta estimates at zero and seasonal frequencies (based on periodogram; usually no limit evident)

3. Quantile data analysis of correlations: IQ(u)
   Goodness of fit of normal distribution: D(u)
   Correlation mean square

4. Quantile data analysis of partial correlations: IQ(u)
   Goodness of fit of normal distribution: D(u)
   Partial correlation inter-quartile range, number of outliers

5. AR description of time series: AIC, CAT orders
   AR coefficients for best order m and 2nd best order
   AR spectral density and spectral distribution plots

6. AR spectral density delta estimators at zero and seasonal frequencies
   Parzen window spectral density delta estimators

7. MA(∞) estimation
   AR coefficients for order 4m, computing partial correlations by non-stationary AR (Burg) method, or optionally by stationary AR (Yule-Walker) method
   Inverse correlations
   Infinite MA coefficients, prediction variance horizon

8. ARMA model identification by select regression
   ARMA spectral density and spectral distribution plots

9. Cepstral pseudo-correlation estimation.

10. Spectral local quantile estimation.


